A Quasi-polynomial-time Algorithm for Sampling Words from a Context-free Language 1 Problem Specification and History

Authors

  • Vivek Gore
  • Mark Jerrum
Abstract

A quasi-polynomial-time algorithm is presented for sampling almost uniformly at random from the $n$-slice of the language $L(G)$ generated by an arbitrary context-free grammar $G$. (The $n$-slice of a language $L$ over an alphabet $\Sigma$ is the subset $L \cap \Sigma^n$ of words of length exactly $n$.) The time complexity of the algorithm is $\varepsilon^{-2}(n|G|)^{O(\log n)}$, where the parameter $\varepsilon$ bounds the variation of the output distribution from uniform, and $|G|$ is a natural measure of the size of grammar $G$. The algorithm applies to a class of language sampling problems that includes slices of context-free languages as a proper subclass.

We address the problem of sampling (almost) uniformly at random a word of length $n$ from a context-free language $L$, and the related problem of estimating the number of words of length $n$ in $L$. Ideally, we would like to obtain a sampling procedure that runs in time polynomial in the length $n$ and the size of the grammar used to specify $L$. This problem has been considered by many authors (for example, Mairson [7]), who have proposed efficient solutions based on dynamic programming, but always restricted to the special case of unambiguous grammars. No polynomial-time algorithm has been proposed for general context-free grammars.

Let $G$ be a context-free grammar generating the language $L = L(G)$, and $n$ a positive integer. The $n$-slice of $L$ is just the subset $L \cap \Sigma^n$ containing all words in $L$ of length $n$. The problem of determining the size of the $n$-slice of $L(G)$ is #P-complete [1], and remains so even when the grammar $G$ is restricted to be regular. (The latter claim can be established by reduction from #DNF [3].) This completeness result does not, however, rule out the possibility of efficient sampling from slices of context-free languages, nor efficient estimation of the size of slices. Kannan, Sweedyk and Mahaney [3] recently presented a "quasi-polynomial-time" (i.e., with running time $\exp(\mathrm{polylog}(|G|, n))$) algorithm for the case of a regular grammar $G$.
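The dynamic-programming approach mentioned above for unambiguous grammars can be illustrated with a minimal sketch. It is not taken from the paper or from Mairson's work. The toy grammar (even-length palindromes over {a, b} in Chomsky normal form) and the names `BINARY`, `TERMINAL`, `count`, and `sample` are assumptions made for illustration. The table counts derivation trees, which equals the number of words only because the grammar is unambiguous.

```python
import random
from functools import lru_cache

# Hypothetical unambiguous grammar in Chomsky normal form (illustration only):
#   S -> A X | B Y | A A | B B,   X -> S A,   Y -> S B,   A -> 'a',   B -> 'b'
# It generates the even-length palindromes over {a, b}.
BINARY = {
    "S": [("A", "X"), ("B", "Y"), ("A", "A"), ("B", "B")],
    "X": [("S", "A")],
    "Y": [("S", "B")],
}
TERMINAL = {"A": ["a"], "B": ["b"]}


@lru_cache(maxsize=None)
def count(sym, n):
    """Number of derivation trees from `sym` yielding words of length n."""
    if n == 1:
        return len(TERMINAL.get(sym, []))
    total = 0
    for left, right in BINARY.get(sym, []):
        for k in range(1, n):  # split point: left part has length k
            total += count(left, k) * count(right, n - k)
    return total


def sample(sym, n, rng=random):
    """Draw a derivation uniformly at random; requires count(sym, n) > 0."""
    if n == 1:
        return rng.choice(TERMINAL[sym])
    r = rng.randrange(count(sym, n))
    for left, right in BINARY.get(sym, []):
        for k in range(1, n):
            c = count(left, k) * count(right, n - k)
            if r < c:  # this (rule, split) pair owns the index r
                return sample(left, k, rng) + sample(right, n - k, rng)
            r -= c
    raise AssertionError("unreachable when the counts are consistent")
```

For an ambiguous grammar the same table overcounts words that have several derivations, so the sampler is no longer uniform over words. That is the gap the article addresses.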
In this article we extend their result to arbitrary context-free grammars $G$. In fact, we operate within the more general setting of "ff; g-programs", which contains context-free grammars as a special case. (See Theorem 1 for an exact statement of the main result.) The main techniques we use are Karp-Luby sampling from a union of sets [5] as extended by Kannan et al. [3], …
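The basic idea of Karp-Luby sampling from a union of sets can be sketched as follows. This is a textbook-style illustration on explicitly listed finite sets, not the extended version of Kannan et al. or the algorithm of this article. The function names `draw_pair`, `sample_union`, and `estimate_union_size` are hypothetical. A set is chosen with probability proportional to its size, an element is drawn uniformly from it, and the draw is accepted only if the chosen set is the first one containing that element, so every element of the union has exactly one accepting pair.

```python
import random


def draw_pair(sets, lists, sizes, rng):
    """Pick set i with probability |S_i| / sum |S_j|, then x uniformly from S_i."""
    i = rng.choices(range(len(sets)), weights=sizes)[0]
    x = rng.choice(lists[i])
    # Accept only the canonical pair: i is the first set that contains x.
    accepted = all(x not in sets[j] for j in range(i))
    return x, accepted


def sample_union(sets, rng=random):
    """Return a uniformly random element of the union by rejection sampling."""
    sizes = [len(s) for s in sets]
    lists = [sorted(s) for s in sets]  # fixed order for uniform indexing
    while True:
        x, accepted = draw_pair(sets, lists, sizes, rng)
        if accepted:
            return x


def estimate_union_size(sets, trials, rng=random):
    """Estimate |union| as (sum of set sizes) * (fraction of accepted draws)."""
    sizes = [len(s) for s in sets]
    lists = [sorted(s) for s in sets]
    hits = sum(draw_pair(sets, lists, sizes, rng)[1] for _ in range(trials))
    return sum(sizes) * hits / trials
```

With m sets, each draw is accepted with probability at least 1/m, so the expected number of retries stays small even when the sets overlap heavily. In the language setting the sets would be described implicitly rather than listed, which is where the additional machinery is needed.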


Similar resources

The Inclusion Problem of Context-Free Languages: Some Tractable Cases

We study the problem of testing whether a context-free language is included in a fixed set L0, where L0 is the language of words reducing to the empty word in the monoid defined by a complete string rewrite system. We prove that, if the monoid is cancellative, then our inclusion problem is polynomially reducible to the problem of testing equivalence of straight-line programs in the same monoid....

Towards Automating Grammar Equivalence Checking

We consider from practical perspective the (generally undecidable) problem of checking equivalence of context-free grammars. We present both techniques for proving equivalence, as well as techniques for finding counter-examples that establish non-equivalence. Among the key building blocks of our approach is a novel algorithm for efficiently enumerating and sampling words and parse trees from ar...

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...

A POLYNOMIAL TIME BRANCH AND BOUND ALGORITHM FOR THE SINGLE ITEM ECONOMIC LOT SIZING PROBLEM WITH ALL UNITS DISCOUNT AND RESALE

The purpose of this paper is to present a polynomial time algorithm which determines the lot sizes for purchase component in Material Requirement Planning (MRP) environments with deterministic time-phased demand with zero lead time. In this model, backlog is not permitted, the unit purchasing price is based on the all-units discount system and resale of the excess units is possible at the order...


Publication date: 1995